Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD)

Authors

  • Xiangying Jiang
  • Martin Ringwald
  • Judith A. Blake
  • Hagit Shatkay
Abstract

The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed to support the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant. In addition to text from title-and-abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification task. Moreover, using information obtained from image captions clearly improves performance compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area. Database URL: www.informatics.jax.org
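
The general idea of the abstract (select informative terms, then classify documents as relevant or irrelevant, with caption text added to title-and-abstract text) can be illustrated with a minimal sketch. The abstract does not name the feature selection criterion, the classifier, or how captions are combined with the other text, so everything below is an assumption chosen for illustration: chi-square term scoring, a multinomial Naive Bayes classifier, simple concatenation of caption text, and four invented toy documents. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def chi_square_select(docs, labels, k):
    """Keep the k terms whose presence is most associated with the label
    (assumed criterion; the abstract only says 'feature selection')."""
    n, n_pos = len(docs), sum(labels)
    df_all, df_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df_all[t] += 1
            df_pos[t] += y
    scores = {}
    for t, df in df_all.items():
        a = df_pos[t]            # relevant docs containing t
        b = df - a               # irrelevant docs containing t
        c = n_pos - a            # relevant docs without t
        d = (n - n_pos) - b      # irrelevant docs without t
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Sort by descending score; break ties alphabetically for determinism.
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])

def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one smoothing over the selected terms."""
    counts = {0: Counter(), 1: Counter()}
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    prior = {y: math.log(labels.count(y) / len(labels)) for y in (0, 1)}
    loglik = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        loglik[y] = {t: math.log((counts[y][t] + 1) / total) for t in vocab}
    return prior, loglik

def predict(model, tokens):
    """Return 1 (relevant) or 0 (irrelevant) for a tokenized document."""
    prior, loglik = model
    score = {y: prior[y] + sum(loglik[y][t] for t in tokens if t in loglik[y])
             for y in (0, 1)}
    return max(score, key=score.get)

# Hypothetical toy corpus: (title/abstract text, caption text, label).
train = [
    ("Expression of Pax6 in the developing mouse eye",
     "In situ hybridization showing Pax6 expression in the optic cup", 1),
    ("Sox2 expression during mouse embryo development",
     "Immunohistochemistry of Sox2 in embryo sections", 1),
    ("Clinical outcomes of statin therapy in adults",
     "Kaplan-Meier survival curves for patients", 0),
    ("A survey of hospital readmission rates",
     "Bar chart of readmission rates by region", 0),
]

# Assumed integration strategy: concatenate caption text with title/abstract.
docs = [tokenize(text + " " + caption) for text, caption, _ in train]
labels = [y for _, _, y in train]

vocab = chi_square_select(docs, labels, k=10)
model = train_nb(docs, labels, vocab)

print(predict(model, tokenize("Wnt1 expression in mouse embryo hindbrain")))
print(predict(model, tokenize("Survival of adults in a clinical trial")))
```

On a corpus of realistic size, the number of selected terms `k` and the choice of classifier would need to be tuned on held-out data; the toy values here only demonstrate the flow from text to selected features to a relevance decision.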

Related articles

The mouse Gene Expression Database (GXD): 2017 update

The Gene Expression Database (GXD; http://www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. GXD collects different types of expression data from studies of wild-type and mutant mice, covering all developmental stages and including data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern...

GXD: a gene expression database for the laboratory mouse. The Gene Expression Database Group

The Gene Expression Database (GXD) is a community resource that stores and integrates expression information for the laboratory mouse, with a particular emphasis on mouse development, and makes these data freely available in formats appropriate for comprehensive analysis. GXD is implemented as a relational database and integrated with the Mouse Genome Database (MGD) to enable global analysis of...

The mouse Gene Expression Database (GXD): 2011 update

The Gene Expression Database (GXD) is a community resource of mouse developmental expression information. GXD integrates different types of expression data at the transcript and protein level and captures expression information from many different mouse strains and mutants. GXD places these data in the larger biological context through integration with other Mouse Genome Informatics (MGI) resou...

GXD: a Gene Expression Database for the laboratory mouse: current status and recent enhancements

The Gene Expression Database (GXD) is a community resource of gene expression information for the laboratory mouse. The database is designed as an open-ended system that can integrate different types of expression data. New expression data are made available on a daily basis. Thus, GXD provides increasingly complete information about what transcripts and proteins are produced by what genes; whe...

The mouse Gene Expression Database (GXD): updates and enhancements

The Gene Expression Database (GXD) is a community resource for gene expression information in the laboratory mouse. By collecting and integrating different types of expression data, GXD provides information about expression profiles in different mouse strains and mutants. Participation in the Gene Ontology (GO) project classifies genes and gene products with regard to molecular functions, biolo...

Journal title:

Volume 2017  Issue 

Pages  -

Publication date: 2017